User-friendly biplots in R
Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za
SASA2024
Aim: A dimension reduction technique that maximises the variation between classes while minimising the within-class variation.
This is achieved by the following tasks:
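The classical variance decomposition \[\mathbf{T}=\mathbf{B}+\mathbf{W}\]
has the following analogue in this setting: \[ \mathbf{X'X} = \bar{\mathbf{X}}'\mathbf{C}\bar{\mathbf{X}} + \mathbf{X}' [\mathbf{I} - \mathbf{G}(\mathbf{G'G})^{-1}\mathbf{C}(\mathbf{G'G})^{-1}\mathbf{G}'] \mathbf{X}, \] where \(\mathbf{G}\) is the indicator matrix of class membership and \(\bar{\mathbf{X}}=(\mathbf{G'G})^{-1}\mathbf{G'X}\) is the matrix of class means.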
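As a numerical check of this decomposition, the base R sketch below uses the state.x77 data with state.region as the classes. It assumes the weighted choice \(\mathbf{C}=\mathbf{G'G}\) (the diagonal matrix of class sizes) and a column-centred \(\mathbf{X}\); the object names (G, N, Xbar, Tmat, B, W, H) are illustrative and not taken from any package.

```r
# Column-centred data matrix (50 states x 8 variables)
X <- scale(state.x77, center = TRUE, scale = FALSE)

# Indicator matrix G: one column per class, a 1 marks class membership
G <- model.matrix(~ state.region - 1)
N <- crossprod(G)                 # G'G: diagonal matrix of class sizes
GtG_inv <- solve(N)

# Class means: Xbar = (G'G)^{-1} G'X
Xbar <- GtG_inv %*% crossprod(G, X)

# Assumption: weighted CVA, i.e. C = G'G
C <- N

Tmat <- crossprod(X)                          # total: X'X
B <- t(Xbar) %*% C %*% Xbar                   # between classes
H <- G %*% GtG_inv %*% C %*% GtG_inv %*% t(G)
W <- t(X) %*% (diag(nrow(X)) - H) %*% X       # within classes

# The two components add up to the total, up to rounding error
all.equal(Tmat, B + W, check.attributes = FALSE)
```

Other choices of \(\mathbf{C}\) can be tried by changing the single line that defines C.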
The choice of \(\mathbf{C}\) determines the variant of CVA:
Find a linear mapping
\[\mathbf{Y}=\mathbf{X}\mathbf{M}, \tag{1}\]
such that \[\frac{\mathbf{m}'\mathbf{B}\mathbf{m}}{\mathbf{m}'\mathbf{W}\mathbf{m}} \tag{2}\] is maximised subject to \(\mathbf{m}'\mathbf{W}\mathbf{m}=1\), where \(\mathbf{m}\) is a column of \(\mathbf{M}\).
It can be shown that this leads to the following equivalent eigen equations:
\[ \mathbf{W}^{-1}\mathbf{BM} = \mathbf{M \Lambda} \tag{3} \]
\[ \mathbf{BM} = \mathbf{WM \Lambda} \tag{4} \]
\[ (\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}) \mathbf{W}^{\frac{1}{2}} \mathbf{M} = (\mathbf{W}^{\frac{1}{2}} \mathbf{M}) \mathbf{\Lambda} \tag{5} \]
with \(\mathbf{M'BM}= \mathbf{\Lambda}\) and \(\mathbf{M'WM}= \mathbf{I}\).
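A minimal base R sketch of solving this eigenproblem through the symmetric matrix \(\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}\) is given below. It assumes weighted CVA on the standardised state.x77 data with state.region as classes; the names W_inv_half, eW and eS are illustrative choices and not part of any package.

```r
X <- scale(state.x77)                    # centred and standardised data
G <- model.matrix(~ state.region - 1)    # indicator matrix of class membership
N <- crossprod(G)                        # diagonal matrix of class sizes
Xbar <- solve(N, crossprod(G, X))        # class means

B <- t(Xbar) %*% N %*% Xbar              # between-class matrix (weighted CVA)
W <- crossprod(X) - B                    # within-class matrix

# Symmetric inverse square root of W from its spectral decomposition
eW <- eigen(W, symmetric = TRUE)
W_inv_half <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)

# Eigen-decomposition of the symmetric matrix W^{-1/2} B W^{-1/2}
eS <- eigen(W_inv_half %*% B %*% W_inv_half, symmetric = TRUE)

M <- W_inv_half %*% eS$vectors           # back-transform the eigenvectors
Lambda <- diag(eS$values)                # eigenvalues in decreasing order

# Check the two normalisation identities M'WM = I and M'BM = Lambda
all.equal(t(M) %*% W %*% M, diag(ncol(X)), check.attributes = FALSE)
all.equal(t(M) %*% B %*% M, Lambda, check.attributes = FALSE)
```

Working with the symmetric form keeps the eigenvalues real and lets eigen() return orthonormal eigenvectors, which is why the back-transformed M satisfies \(\mathbf{M'WM}=\mathbf{I}\).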
Since the matrix \(\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}\) is symmetric and positive semi-definite, the eigenvalues in the matrix \(\mathbf{\Lambda}\) are non-negative and ordered. Since \(\mathrm{rank}(\mathbf{B}) = \min(p, G-1)\), only the first \(\mathrm{rank}(\mathbf{B})\) eigenvalues are non-zero. We form the canonical variates with the transformation
\[ \bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{M}.\tag{6} \]
The first two canonical variates are given by:
\[\mathbf{\bar{Z}}=\mathbf{\bar{Y}}\mathbf{J}_2=\mathbf{\bar{X}}\mathbf{M}\mathbf{J}_2 \tag{7}\] where \(\mathbf{J'}_2=[\mathbf{I}_2 \quad \mathbf{0}]\). We add the individual sample points with the same transformation \[\mathbf{Z}=\mathbf{X}\mathbf{M}\mathbf{J}_2. \tag{8}\]
A new sample point, \(\mathbf{x}^*\), can be added by interpolation \[\mathbf{z}^*=\mathbf{x}^*\mathbf{M}\mathbf{J}_2.\tag{9}\]
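The projection and interpolation steps can be sketched in base R as follows. The helper cva_map() and the new observation x_raw are hypothetical, introduced only for illustration; weighted CVA on the standardised state.x77 data with state.region as classes is assumed.

```r
# Hypothetical helper: returns the transformation matrix M for weighted CVA
cva_map <- function(X, classes) {
  G <- model.matrix(~ classes - 1)
  N <- crossprod(G)
  Xbar <- solve(N, crossprod(G, X))
  B <- t(Xbar) %*% N %*% Xbar
  W <- crossprod(X) - B
  eW <- eigen(W, symmetric = TRUE)
  W_inv_half <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)
  W_inv_half %*% eigen(W_inv_half %*% B %*% W_inv_half, symmetric = TRUE)$vectors
}

X <- scale(state.x77)
M <- cva_map(X, state.region)

# J_2 keeps the first two columns: J_2' = [I_2 0]
J2 <- rbind(diag(2), matrix(0, ncol(X) - 2, 2))

Z <- X %*% M %*% J2                      # sample points in two dimensions
Zbar <- apply(Z, 2, function(z) tapply(z, state.region, mean))  # class means

# Interpolating a new point: it must be centred and scaled like X.
# Here the "new" observation is simply the mean profile of the West class.
x_raw <- colMeans(state.x77[state.region == "West", ])
x_new <- (x_raw - attr(X, "scaled:center")) / attr(X, "scaled:scale")
z_new <- x_new %*% M %*% J2
```

Because the mapping is linear, a new point equal to a class mean profile is interpolated onto the position of that class mean in the biplot.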
CVA function

| Argument | Description |
|---|---|
| bp | Object of class biplot. |
| classes | Vector of class membership. User specified, otherwise defaults to the vector specified in biplot. |
| dim.biplot | Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2. |
| e.vects | Which eigenvectors (canonical variates) to extract, with default 1:dim.biplot. |
| weightedCVA | "weighted", "unweightedCent" or "unweightedI": controls which type of CVA to perform, with default "weighted". |
| show.class.means | TRUE or FALSE: controls whether class means are plotted, with default TRUE. |
| low.dim | "sample.opt" or "Bhattacharyya.dist": controls the method of constructing additional dimension(s) if dim.biplot is greater than the number of classes, with default "sample.opt". |
The means() function allows the user to make adjustments to the points representing the class means.

| Argument | Description |
|---|---|
| bp | An object of class biplot. |
| which | A vector containing the groups or classes for which the means should be displayed, with default bp$g. |
The following arguments control the aesthetic options for the plotted class mean points:

| Argument | Description |
|---|---|
| col | The colour(s) for the means, with default as the colour of the samples. |
| pch | The plotting character(s) for the means, with default 15. |
| cex | The character expansion(s) for the means, with default 1. |
| opacity | The transparency of the means. |
| shade.darker | A logical value indicating whether the colour of the mean points should be made a shade darker than the default or specified colour, with default TRUE. |
The following arguments control the aesthetic options for the labels accompanying the plotted class mean points:

| Argument | Description |
|---|---|
| label | A logical value indicating whether the means should be labelled, with default TRUE. |
| label.col | A vector of the same length as which with label colours for the means, with default as the colour of the means. |
| label.cex | A vector of the same length as which with label text expansions for the means, with default 0.75. |
| label.side | The side at which the label of the plotted mean point appears, with default bottom. |
| label.offset | The offset of the label from the plotted mean point. |
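A hedged usage sketch of the arguments documented above follows. It assumes that these functions are the ones exported by the biplotEZ package, that the package is installed, and that a biplot object is first created with biplot() before being passed on; the object name bp_cva and the chosen aesthetic values are illustrative. The block does nothing when the package is unavailable.

```r
# Assumption: CVA() and means() as documented above come from biplotEZ
if (requireNamespace("biplotEZ", quietly = TRUE)) {
  bp_cva <- biplotEZ::biplot(state.x77) |>
    biplotEZ::CVA(classes = state.region,     # class membership vector
                  weightedCVA = "weighted") |> # the documented default
    biplotEZ::means(pch = 17,                  # triangles instead of squares
                    cex = 1.5,                 # larger mean points
                    label = TRUE,              # label the class means
                    label.cex = 0.9)
  plot(bp_cva)
}
```

Only arguments listed in the tables above are used; any other argument names should be checked against the package documentation.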
The classify() function creates classification regions for the CVA biplot and appends the following elements to the biplot object:
A confusion matrix from the classification into classes
The classification accuracy rate
A logical value indicating whether classification regions are shown in the biplot
A list of chosen aesthetics for the classification regions
The midpoints of the classification regions
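To illustrate what a confusion matrix and classification accuracy rate look like for this kind of display, the sketch below assigns each sample to the nearest class mean in two dimensions. It is not the classify() implementation: it swaps in the discriminant scores from MASS::lda() as a stand-in for the canonical variates, and the nearest-mean rule and all object names are assumptions made for illustration.

```r
library(MASS)  # recommended package distributed with R

# Stand-in for the canonical variates: first two linear discriminant scores
fit <- lda(x = scale(state.x77), grouping = state.region)
Z <- predict(fit)$x[, 1:2]

# Class means in the two-dimensional space (one row per class)
Zbar <- apply(Z, 2, function(z) tapply(z, state.region, mean))

# Squared Euclidean distance from every sample to every class mean
d2 <- sapply(seq_len(nrow(Zbar)),
             function(k) rowSums(sweep(Z, 2, Zbar[k, ])^2))

# Assign each sample to the class with the nearest mean
nearest <- max.col(-d2, ties.method = "first")
predicted <- factor(levels(state.region)[nearest],
                    levels = levels(state.region))

# Confusion matrix and classification accuracy rate
confusion <- table(Actual = state.region, Predicted = predicted)
accuracy <- sum(diag(confusion)) / sum(confusion)
```

The nearest-mean rule partitions the plane into one region per class, which is the idea behind shading classification regions in a biplot.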
The alpha.bags() function creates \(\alpha\)-bags and appends the following elements to the biplot object:
A list of coordinates for the \(\alpha\)-bags for each group
A vector of colours for the \(\alpha\)-bags
A vector of line types for the \(\alpha\)-bags
A vector of line widths for the \(\alpha\)-bags
The ellipses() function creates \(\kappa\)-ellipses and appends the following elements to the biplot object:
A list of coordinates for the \(\kappa\)-ellipses for each group
A vector of colours for the \(\kappa\)-ellipses
A vector of line types for the \(\kappa\)-ellipses
A vector of line widths for the \(\kappa\)-ellipses
A vector of \(\alpha\) values
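How the boundary coordinates of a \(\kappa\)-ellipse for one group might be computed is sketched below in base R. The sketch assumes that the ellipse is the set of points at Mahalanobis distance \(\kappa\) from the group mean, and that \(\kappa\) is taken as the square root of a chi-squared quantile with 2 degrees of freedom, so that under bivariate normality the ellipse covers a proportion \(\alpha\) of the distribution. The helper kappa_ellipse() and the simulated coordinates are hypothetical.

```r
# Hypothetical helper: boundary coordinates of a kappa-ellipse for 2D points Z
kappa_ellipse <- function(Z, kappa, n_points = 200) {
  centre <- colMeans(Z)
  S <- cov(Z)
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- cbind(cos(theta), sin(theta))   # points on the unit circle
  R <- chol(S)                              # upper triangular, S = R'R
  sweep(kappa * circle %*% R, 2, centre, "+")
}

# Simulated two-dimensional coordinates standing in for one group of samples
set.seed(1)
Z <- cbind(rnorm(100, mean = 2), rnorm(100, mean = -1, sd = 0.5))

alpha <- 0.95
kappa <- sqrt(qchisq(alpha, df = 2))   # assumption: kappa from a chi-squared quantile
ell <- kappa_ellipse(Z, kappa)

# Every boundary point lies at Mahalanobis distance kappa from the centre
d <- sqrt(mahalanobis(ell, colMeans(Z), cov(Z)))
```

The resulting matrix ell can be drawn with lines() or polygon() on top of a scatterplot of Z.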
The fit.measures contain the following information on how well the biplot represents the information of the original and canonical space:
quality: Quality of fit for canonical and original variables
adequacy: Adequacy of original variables
axis.predictivity: Axis predictivity
within.class.axis.predictivity: Within class axis predictivity
within.class.sample.predictivity: Within class sample predictivity
The summary() function prints to screen the fit.measures stored in the object of class biplot.
# Object of class biplot, based on 50 samples and 8 variables.
# 8 numeric variables.
# 4 classes: Northeast South North Central West
#
# Quality of fit of canonical variables in 2 dimension(s) = 91.9%
# Quality of fit of original variables in 2 dimension(s) = 93.4%
# Adequacy of variables in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad
# 0.453533269 0.105327455 0.107221535 0.002201286 0.208653101 0.687840023
# Frost Area
# 0.452308013 0.118544323
# Axis predictivity in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad Frost
# 0.9873763 0.9848608 0.8757913 0.9050208 0.9955088 0.9970346 0.9558192
# Area
# 0.9344651
# Class predictivity in 2 dimension(s):
# Northeast South North Central West
# 0.8031465 0.9985089 0.6449906 0.9988469
# Within class axis predictivity in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad Frost
# 0.02246821 0.10349948 0.27870637 0.21460313 0.29836047 0.87510975 0.22320989
# Area
# 0.13603927
# Within class sample predictivity in 2 dimension(s):
# Alabama Alaska Arizona Arkansas California
# 0.769417280 0.174566384 0.328610375 0.148035077 0.103141908
# Colorado Connecticut Delaware Florida Georgia
# 0.357627854 0.079176621 0.438089663 0.327270922 0.558038750
# Hawaii Idaho Illinois Indiana Iowa
# 0.029173037 0.167543892 0.076948041 0.473148418 0.592667777
# Kansas Kentucky Louisiana Maine Maryland
# 0.774719240 0.439306768 0.190654770 0.086183357 0.284829878
# Massachusetts Michigan Minnesota Mississippi Missouri
# 0.428103056 0.188094295 0.644844800 0.163103449 0.719255739
# Montana Nebraska Nevada New Hampshire New Jersey
# 0.239142302 0.671350698 0.015766988 0.386053551 0.207503850
# New Mexico New York North Carolina North Dakota Ohio
# 0.012872885 0.008101305 0.872322617 0.457852394 0.092634247
# Oklahoma Oregon Pennsylvania Rhode Island South Carolina
# 0.561156131 0.158926944 0.261838286 0.482912999 0.229047767
# South Dakota Tennessee Texas Utah Vermont
# 0.095865021 0.237667483 0.121494852 0.349495632 0.256983459
# Virginia Washington West Virginia Wisconsin Wyoming
# 0.453608981 0.044780371 0.346223950 0.544998639 0.174849092
The rotate() function rotates the samples and axes in the biplot by rotate.degrees degrees.
The reflect() function reflects the samples and axes in the biplot along an axis: x (horizontal reflection), y (vertical reflection) or xy (diagonal reflection).
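The geometry behind rotating and reflecting a two-dimensional display can be sketched with plain matrices. The toy coordinates, the helper rotation_matrix() and the convention that positive angles rotate anticlockwise are assumptions made for illustration; the sketch does not claim which sign matrix corresponds to which option of reflect().

```r
# Toy two-dimensional coordinates standing in for biplot sample points (rows)
Z <- rbind(c(1, 0), c(0, 1), c(2, 1))

# Rotation matrix for row vectors (assumption: positive degrees = anticlockwise)
rotation_matrix <- function(degrees) {
  a <- degrees * pi / 180
  matrix(c(cos(a), -sin(a), sin(a), cos(a)), nrow = 2, ncol = 2)
}

Z_rot <- Z %*% rotation_matrix(90)   # rotate every point by 90 degrees

# Reflections are diagonal sign matrices
flip_first  <- diag(c(-1, 1))        # negate the first coordinate
flip_second <- diag(c(1, -1))        # negate the second coordinate
flip_both   <- diag(c(-1, -1))       # negate both coordinates

Z_ref <- Z %*% flip_first
```

Both operations are orthogonal transformations, so the distances between points are unchanged and only the orientation of the display differs.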
zoom = TRUE in plot()
The argument zoom is FALSE by default. If zoom = TRUE, a new graphical device is launched and the user is prompted to click on the desired upper left-hand and lower right-hand corners of the zoomed-in plot.
The dim.biplot argument can be set to 3 to allow the user to create a 3D biplot. The plot() function makes use of the RGL device for the 3D display.
state.x77 data with class means